Abstract

Can we objectively distinguish chemical systems that are able to process meaningful information
from those that are not suitable for information processing? Here, we present a formal method to
assess the semantic capacity of a chemical reaction network. The semantic capacity of a network
can be measured by analyzing the capability of the network to implement molecular codes. We
analyzed models of real chemical systems (Martian atmosphere chemistry and various combustion
chemistries), bio-chemical systems (gene expression, gene translation, and phosphorylation signaling
cascades), as well as an artificial chemistry and random networks. Our study suggests that different
chemical systems possess different semantic capacities. Basically no semantic capacity was found
in the atmosphere chemistry of Mars and all studied combustion chemistries, as well as in highly
connected random networks, i.e., with these chemistries molecular codes cannot be implemented.
High semantic capacity was found in the bio-chemical systems, as well as in random networks where
the number of second order reactions is at the number of species. Hypotheses concern the origin
and evolution of life. We conclude that our approach can be applied to evaluate the information
processing capabilities of a chemical system and may thus be a useful tool to understand the origin
and evolution of meaningful information, e.g., at the origin of life.
Introduction

In recent years great advances have been made in understanding the bio-chemical basis of biological
information processing. For theoretical analysis of biological information Shannon's theory of
communication [39] has been applied very successfully in various domains, like genomics [38], bacterial
quorum sensing [28], or signaling in molecular systems [2J]. The mathematical theory of
communication focuses on uncertainty of events and intentionally neglects semantic aspects of information
because "they are irrelevant for the engineering problem" (Shannon [39], p. 1). However, in order
to obtain a full understanding of biological information, studying also semantic as well as pragmatic
aspects would be important, if not necessary [22], [29]. Although syntax, semantics, and pragmatics
are interdependent, as detailed in the Co^ approach 4^, we concentrate here on semantic aspects 
of molecular networks in order to keep our formalism and analysis clear and concise. 

In general, semantics refers to the relation between a sign and its meaning and can be described 
by a code. An example is the genetic code, which is a mapping between codons (signs) and amino 
acids (meanings). An important property of this mapping or relation is its contingency, that is,
the relation could be different and thus is not determined by the signs and meanings alone 0, 29|. 
We say that the relation between signs and meanings is contingent, if the relation cannot be derived
by applying natural laws to the signs and meanings alone. This implies that natural laws let us
derive the relation only if we additionally know the context under which the signs are interpreted.
Furthermore, it implies that there is potentially another context under which the signs are interpreted
differently.

It is thus sometimes stressed that the relation between signs and meanings cannot be explained
by physical laws [35], just as the natural laws do not help in understanding the written law or the
grammar of a language. However, more often than not this notion of independence from natural
laws is the cause of confusion. So, in order to properly use semiotic concepts in biology, we should
provide a proper - ideally formal - link from these concepts to the realm of physics. To achieve
this we take the following strategy: (Step 1) Select an experimentally grounded and reliable formal
description of the targeted biological system. Here, we take the reaction network as this formal
description. (Step 2) Provide a precise, not necessarily formal, definition of the semiotic concepts
that shall be applied to the system. Here, we take the notion of an organic code as reviewed by
Barbieri. (Step 3) Interpret these definitions by linking them to the formal description of the
biological system. Here, this is done by our formal definition of a molecular code. With this a
semiotic concept gets - at least partially - operationalized by means of physical experiments. In
particular, it allows us to incorporate contingency in a formal model of molecular codes.

To illustrate the basic idea of an explicit modeling of contingency we will briefly discuss an example
reaction network, which exhibits a contingency. Figure 1A shows a reaction network containing
eight molecular species {A, B, C, D, E, F, G, H} and four reactions. Here, we assume that the
network contains all possible reactions that can appear when mixing these molecules. The network
then is a complete model of the world, i.e., no species and reactions are missing that are physically
possible. A mapping in a reaction network relates molecular species. Here, for example, {A} can
be mapped on {C} by reaction A + E → E + C. {E} is necessary for the reaction to happen and
thus we call it a molecular context. The network can implement a molecular code, if there exists a
set of molecular species that can be mapped on a second set of molecular species in at least two
different ways. In this example network the sets S = {A, B} and M = {C, D} fulfill this property.
S (domain) maps on M (codomain) by applying the context {E, H}. No two elements of the
domain S map to the same element in the codomain M. There exists an alternative molecular context
{F, G} which realizes a different mapping between domain and codomain. The existence of these
two alternative mappings qualifies them as codes, since they emerge from a contingency.



Materials and Methods 

In this section we provide a formal definition of a molecular code as a contingent mapping that 
can be realized by a reaction network, then we formally define the semantic capacity of a reaction 
network based on the number of molecular codes it can realize, and finally describe two algorithms 
for finding all molecular codes of a reaction network. 

Definition of Molecular Code 

A reaction network $(\mathcal{M}, \mathcal{R})$ is defined by a set of molecular species $\mathcal{M}$ and a set of reactions $\mathcal{R}$
occurring among the molecular species $\mathcal{M}$. See Figure 1A for an example. For each reaction $\rho \in \mathcal{R}$,
let LHS($\rho$) and RHS($\rho$) denote the set of reactant species (left hand side) and product species (right
hand side) of reaction $\rho$, respectively.

A subset of molecular species $C \subseteq \mathcal{M}$ is called closed, iff the application of all possible reactions
from $\mathcal{R}$ on $C$ produces only species from $C$, i.e., for all $\rho \in \mathcal{R}$ with LHS($\rho$) $\subseteq C$: RHS($\rho$) $\subseteq C$
[10]. For any set of species $A \subseteq \mathcal{M}$ there exists a smallest closed set $G_{CL}(A)$ containing $A$ [40]. We
say that $G_{CL}(A)$ is the closure of $A$. Intuitively, the closure of a set of species contains all those
species that can be reached by arbitrarily long reaction pathways among the species of that set.

Figure 1. Two exemplary reaction networks containing molecular codes. Panel A:
Chemical reaction network $(\mathcal{M}, \mathcal{R})$ with species $\mathcal{M} = \{A, B, C, D, E, F, G, H\}$ and reaction rules
$\mathcal{R} = \{A + E \to C + E, \ldots\}$; panel B: Code pair that can be realized by the network in panel A.
The binary molecular codes are characterized by $S = \{A, B\}$, $M = \{C, D\}$, and the two
codemakers $C = \{E, H\}$ and $C' = \{F, G\}$; panel C: Chemical reaction network with species
$\mathcal{M} = \{I, J, K, L, M, N, M\}$. The two species labeled 'M' denote the same species; panel D: Two
molecular code pairs can be realized by the network in panel C.

Definition 1: [molecular mapping] Given a reaction network $N = (\mathcal{M}, \mathcal{R})$ and two sets
of molecular species $S, M \subseteq \mathcal{M}$, we say that $f : S \to M$ is a molecular mapping with respect to
the reaction network $N$, iff there exists a set of species $C \subseteq \mathcal{M}$ (called context), such that for each
$s, s' \in S$ with $s \neq s'$: $f(s) \in G_{CL}(C \cup \{s\})$ and $f(s') \notin G_{CL}(C \cup \{s\})$. If there is a molecular
mapping $f$ with respect to $N$, we also say that $N$ can realize the molecular mapping $f$.

Note that in a reaction network there is usually more than one molecular context $C$ that realizes
a particular molecular mapping $f$. Intuitively, in order to "compute" $f(s)$ with the reaction network
$N$, we put all molecules from the context $C$ together with $s$ in a reaction vessel. Then we repeatedly
apply all applicable reaction rules and add the products to the reaction vessel until no novel molecular
species can be added anymore. Then we check which molecular species from $M$ is present, which
must be - according to our definition - unique and the result of $f(s)$.
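
The reaction-vessel procedure can be sketched as a fixed-point iteration. The following is only an illustration under stated assumptions: the function names (`rxn`, `closure`, `evaluate`) are hypothetical, and of the four reactions of Figure 1A only A + E → C + E is given in the text; the other three are an assumed completion chosen so that the contexts {E, H} and {F, G} behave as described in the Introduction.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS species, RHS species)


def rxn(lhs: str, rhs: str) -> Reaction:
    """Build a reaction from 'A + E'-style strings (stoichiometry is ignored)."""
    def side(text: str) -> FrozenSet[str]:
        return frozenset(part.strip() for part in text.split("+"))
    return side(lhs), side(rhs)


def closure(seed: Iterable[str], reactions: List[Reaction]) -> FrozenSet[str]:
    """Smallest closed set containing `seed`: apply every applicable rule
    and add its products until no novel species appears."""
    vessel: Set[str] = set(seed)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in reactions:
            if lhs <= vessel and not rhs <= vessel:
                vessel |= rhs
                changed = True
    return frozenset(vessel)


def evaluate(sign: str, context: Iterable[str], codomain: Iterable[str],
             reactions: List[Reaction]) -> Optional[str]:
    """Return f(sign) under `context`, or None if the result is not unique."""
    reached = closure({sign} | set(context), reactions) & set(codomain)
    return next(iter(reached)) if len(reached) == 1 else None


# ASSUMPTION: only the first reaction is stated in the text; the remaining
# three are a hypothetical completion of the network in Figure 1A.
FIG_1A = [
    rxn("A + E", "C + E"),
    rxn("B + H", "D + H"),
    rxn("A + F", "D + F"),
    rxn("B + G", "C + G"),
]

print(evaluate("A", {"E", "H"}, {"C", "D"}, FIG_1A))  # C
print(evaluate("B", {"E", "H"}, {"C", "D"}, FIG_1A))  # D
print(evaluate("A", {"F", "G"}, {"C", "D"}, FIG_1A))  # D
```

Representing each reaction side as a set ignores stoichiometry, which is sufficient here because the closure only tracks which species are present, not how many copies.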

Definition 2: [molecular code] Given a reaction network $N = (\mathcal{M}, \mathcal{R})$ and a non-constant
molecular mapping $f : S \to M$, with $S, M \subseteq \mathcal{M}$, with respect to $N$, we call the mapping $f$ a
molecular code with respect to $N$, if all other mappings $g : S \to M$ with the same domain $S$ and
codomain $M$ as $f$ can also be realized by the reaction network $N$. (A mapping $f : S \to M$ is called
non-constant, iff there exist $s, s' \in S$ such that $f(s) \neq f(s')$.)

The definition captures the notion of contingency as described above, i.e., the elements of the
domain can be mapped to the elements of the codomain in an arbitrary way by changing the
context. In order to keep our study tractable, we will focus on molecular codes that are binary, i.e.,
where $S$ as well as $M$ contain exactly two molecular species [12]. We will also not study molecular
mappings that are only partially contingent here. For binary molecular codes our definition can be
reformulated more explicitly:

Definition 3: [binary molecular code (BMC)] Given a reaction network $(\mathcal{M}, \mathcal{R})$ and two
binary sets of molecular species $S = \{s_1, s_2\} \subseteq \mathcal{M}$ and $M = \{m_1, m_2\} \subseteq \mathcal{M}$, the mapping
$f : S \to M$ is called a binary molecular code, iff there exist two sets $C \subseteq \mathcal{M}$ (called codemaker)
and $C' \subseteq \mathcal{M}$ (called alternative codemaker) such that the following conditions hold:

$f(s_1) \in G_{CL}(\{s_1\} \cup C)$, and $f(s_2) \notin G_{CL}(\{s_1\} \cup C)$, and
$f(s_2) \in G_{CL}(\{s_2\} \cup C)$, and $f(s_1) \notin G_{CL}(\{s_2\} \cup C)$, and
$f(s_2) \in G_{CL}(\{s_1\} \cup C')$, and $f(s_1) \notin G_{CL}(\{s_1\} \cup C')$, and
$f(s_1) \in G_{CL}(\{s_2\} \cup C')$, and $f(s_2) \notin G_{CL}(\{s_2\} \cup C')$.



As stated in Definition 3, each binary molecular code comes with a second code implementing
a different mapping. The alternative code $g$ is determined by $g(s_1) = f(s_2)$ and $g(s_2) = f(s_1)$.
$K = (f, g)$ is called a code pair. Two simple example networks are shown in Figures 1A and 1C.
Both networks appear to be very similar in their structure, but they show different numbers of codes.
While the former network is capable of realizing one code pair, the latter network - though being
smaller - can realize two code pairs.
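
The four pairs of conditions in Definition 3 can be checked mechanically once a codemaker and an alternative codemaker are proposed. The sketch below is illustrative only: `realizes` and `is_bmc` are hypothetical names, and the network is an assumed completion of Figure 1A (the text states only the reaction A + E → C + E).

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS species, RHS species)


def closure(seed: Iterable[str], reactions: List[Reaction]) -> FrozenSet[str]:
    """Smallest closed set containing `seed`."""
    closed: Set[str] = set(seed)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in reactions:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return frozenset(closed)


def realizes(mapping: Dict[str, str], context: Iterable[str],
             reactions: List[Reaction]) -> bool:
    """True if, for every sign, the closure of {sign} plus the context contains
    the sign's own meaning and none of the other meanings."""
    meanings = set(mapping.values())
    for sign, meaning in mapping.items():
        reached = closure({sign} | set(context), reactions)
        if meaning not in reached or (meanings - {meaning}) & reached:
            return False
    return True


def is_bmc(S: Tuple[str, str], M: Tuple[str, str], codemaker: Iterable[str],
           alt_codemaker: Iterable[str], reactions: List[Reaction]) -> bool:
    """Check Definition 3 for f = (s1 -> m1, s2 -> m2) and its swapped partner g."""
    (s1, s2), (m1, m2) = S, M
    f = {s1: m1, s2: m2}
    g = {s1: m2, s2: m1}
    return realizes(f, codemaker, reactions) and realizes(g, alt_codemaker, reactions)


# ASSUMPTION: hypothetical completion of Figure 1A; frozenset("AE") is {"A", "E"}.
FIG_1A: List[Reaction] = [
    (frozenset("AE"), frozenset("CE")),  # A + E -> C + E (given in the text)
    (frozenset("BH"), frozenset("DH")),  # assumed
    (frozenset("AF"), frozenset("DF")),  # assumed
    (frozenset("BG"), frozenset("CG")),  # assumed
]

print(is_bmc(("A", "B"), ("C", "D"), {"E", "H"}, {"F", "G"}, FIG_1A))  # True
print(is_bmc(("A", "B"), ("C", "D"), {"E", "H"}, {"E", "H"}, FIG_1A))  # False
```

The second call fails because the same context cannot realize both mappings, which is exactly the contingency requirement.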

Semantic Capacity 

Now we can measure the semantic capacity of a system as the system's capacity to realize contingent 
mappings. Concretely, we count how many different mappings a network can realize. Note that these 
codes need not be realized at the same time. Further note that we count each mapping only once, 
even if it can be realized by more than one codemaker. 

Because we study only binary codes here and binary codes always come in pairs, it is reasonable
to count the number of different code pairs $CP_N$ of a network $N$. So, the semantic capacity is defined
here as:

$SC(N) = CP_N$. (1)

The number of code pairs can be high and can grow exponentially with network size, such that we
use the logarithm for comparing the semantic capacities of different networks. The logarithmic semantic
capacity is defined as

$SC_{log}(N) = \log_2(1 + SC(N)) = \log_2(1 + CP_N)$. (2)

In $SC_{log}$ we apply the transformation $1 + x$ to guarantee that the logarithmized semantic capacity
is well defined and its smallest value is zero, in case the network cannot realize any molecular code.

The mean semantic capacity of a group of $n$ networks $N_i$ ($i = 1, \ldots, n$) is calculated by the
arithmetic mean of the respective measure (linear or log).
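
As a small worked illustration of Eqs. (1) and (2), the following sketch computes the logarithmic semantic capacity from a code pair count and the mean over a group of networks. The function names are hypothetical; the count of 16 is the value reported later for the merge of the known genetic codes.

```python
import math
from typing import Sequence


def sc_log(code_pairs: int) -> float:
    """Logarithmic semantic capacity, Eq. (2): log2(1 + number of code pairs)."""
    return math.log2(1 + code_pairs)


def mean_sc(code_pair_counts: Sequence[int], logarithmic: bool = False) -> float:
    """Arithmetic mean of the linear or logarithmic semantic capacity."""
    values = [sc_log(c) if logarithmic else float(c) for c in code_pair_counts]
    return sum(values) / len(values)


print(sc_log(0))             # 0.0, a network without any molecular code
print(round(sc_log(16), 2))  # 4.09
print(mean_sc([0, 16]))      # 8.0
```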
In future studies, the semantic capacity could be integrated with measures of the code's quality,
fitness, or cost [42], [43]. E.g., two networks with the same number of code pairs could be differentiated
with respect to the costs to implement those codes.

Algorithmic Identification 

The closure-based definition of BMCs (Definition 3) allows us to develop algorithms for automatic 
detection of codes in reaction networks. The algorithm searches for a network pattern, i.e., a 
combination of molecular species and reactions, fulfilling the conditions stated above. 

Definition 3 leads to an algorithm that first calculates all closed sets and then checks combinations
of closed sets for the BMC condition as stated in Definition 3. In particular, for the two elements of
the domain and the two elements of the codomain, the single molecular closed sets, i.e., the closed
sets that are generated by a single molecular species alone ($G_{CL}(\{m\}), m \in \mathcal{M}$), are used. There exist
at most $|\mathcal{M}|$ single molecular closed sets. The closure-based algorithm has a worst-case running
time complexity of 0{\Ai\'^nl) with ric as number of all closed sets contained in the system (cf. 
Supplement Text S1).

Alternatively, we can analyze a network in terms of pathways. This makes sense since signs
and meanings always need to be connected by paths of reactions. The running time complexity of the
pathway-based algorithm depends on the number of paths the network contains. For the identification
of BMCs all s-t-paths for all pairs of species are identified. Any combination of four paths is checked
for the BMC condition. Since the number of paths in a network grows enormously with the density
of the network we apply a parameterized algorithm that uses only the k-shortest paths [27] between
every pair of species. The worst case running time then is bounded by 0{\A4\'^k^). If k is chosen too 
small the algorithm is not able to find all codes in the system, but gives an approximate measure. 
Pseudo-code for both algorithms and subroutines is given in the supplement (Supplement Text S1).

The different running time complexities suggest a conditional application of the algorithms.
The pathway-based algorithm can be efficiently applied on networks that have a high number of
closed sets and a low number of paths, while the closure-based algorithm can be applied in the
other case, where the number of paths is high and the number of closed sets in the network is low.
Interestingly, systems with high semantic capacity tend to have both a high number of closed sets and
many pathways (see below), so that an algorithmic challenge remains for analyzing such systems
more efficiently.



Results 

In the following we survey different kinds of systems for their semantic capacity by the application 
of the algorithms described above. The analyzed systems are the gene translation chemistry (GC),
gene regulatory networks (GRN), phosphorylation cascades (PC), combustion chemistries (CC), the
Martian atmosphere chemistry, and random networks. A summary of the analysis of the biological
systems (GRN, GC, PC) is given in Table [71, while all systems are compared in Table [HI. To apply our
algorithms we had to construct the reaction networks for some of these systems first. To accomplish
this we followed a knowledge-based approach for GC, GRN, and PC.

Biological Reaction Networks

Gene Translation 

We will show now that the gene translation chemistry can realize molecular codes. In particular, this
suggests that the genetic code, i.e., the mapping describing the translation from nucleotide triplets
to amino acids, is a molecular code. The fact that there is more than one genetic code has been known for
a long time [17, 34]. Here, we analyze the 17 already known genetic codes, as listed at NCBI joj.
The different genetic codes cover nuclear and non-nuclear codes of different genera, e.g., bacterial,
archaeal, and plant plastid codes, the vertebrate, invertebrate, and yeast mitochondrial codes, and
the alternative yeast nuclear code. In particular, we construct a reaction network containing the
codons, the amino acids, and the specific tRNAs, which are necessary for the translation. For all
mappings between DNA triplets and amino acids occurring in the 17 codes we add a reaction in the
network of the form codon + tRNA → amino acid. The obtained reaction network (Supplement
Network S2) represents a merge of the 17 genetic codes and contains 234 molecular species and 85
reactions.

Table 1. BMCs found in the merge of the 17 known genetic codes. Here the 16 found
BMCs are summarized. If applicable BMCs are grouped. See supplement Text S3 for the code
pairs.

Signs (codons)        Meanings (amino acids)   #BMC   References
CTT, CTG, CTA, CTC    L, T                     6      [6, 3^
AGG, AGA              G, S, R, Stop            6      [3, 4, 8, 11, 14-16, 20, 32-34, 41, 46]
AGG, TCA              S, Stop                  1      [3, 4, 15, 31, 33, 3^
AGA, TCA              S, Stop                  1      [3, 4, 15, 31, 33, 3^
TTA, TAG              L, Stop                  1      [9, 13, 23, 31, 3^
TAA, TAG              Q, Stop                  1      [19, 25, 34, 36, 37]

References - Articles reporting the respective alternatives in the genetic code that are part of a BMC in this analysis.

The algorithmic analysis of this network identified 16 binary molecular codes (Supplement Text
S3), i.e., a semantic capacity of $SC_{log} = 4.09$. The binary codes can partly be assigned to larger
molecular codes. CTT, CTG, CTA, and CTC can be mapped on leucine (L) and threonine (T) and
give rise to six of the found BMCs. The second group involves the mapping between AGG, AGA
and glycine (G), serine (S), arginine (R) and the translation stop. This code can also be decomposed
into six BMCs. There exist four more BMCs that involve the codons TCA, TTA, TAG and
TAA and the amino acids leucine (L), glutamine (Q) and the stop signal. Table 1 summarizes the
BMCs found. The existence of alternative mappings in the genetic translation system suggests that
the genetic code qualifies as a molecular code.

We may model the genetic code now by including all potential mappings between codons and
amino acids, i.e., the model includes all possible tRNAs such that any codon could be read for any
amino acid (Supplement Network S3). In such a system the number of binary molecular codes can
easily be calculated. Each pair of codons forms a code pair with each pair of amino acids. Since
there exist $\binom{64}{2}$ pairs of triplets and $\binom{20}{2}$ pairs of amino acids, the number of BMCs in this potential
setup is

$\binom{64}{2} \cdot \binom{20}{2} = 2016 \cdot 190 = 383{,}040$.

The logarithmic semantic capacity is $\approx 18.55$.

In the following, we refine the network model by constructing a reaction network containing all
possible mappings between the 64 codons and 20 amino acids as described above. Additionally,
we model the loading step of the tRNAs by inserting the respective amino-acyl-tRNA-synthetases
(aaRS) (cf. Figure 2, Supplement Network S5). The reaction network $N_{GC} = (\mathcal{M}_{GC}, \mathcal{R}_{GC})$ describes
the core molecular mechanism realizing the standard genetic code and all alternative codes. The set
of molecular species $\mathcal{M}_{GC}$ of the network contains all DNA strings of length three (Table 2, Eq. 2),
representing the codons. It contains the twenty proteinogenic amino acids in their free form (Table
2, Eq. 3) and the twenty amino acids bound in a protein (Table 2, Eq. 4). To describe the system
properly we also need to insert species for all possible tRNAs in their unloaded (Table 2, Eq. 5) and
loaded form (Table 2, Eq. 6). In the unloaded form we represent specificity to codon $n$ with $n$ as
subscript; in the loaded form we represent specificity to $n$ with a subscript $n$ and the loaded amino
acid $a$ with a subscript index $a$, i.e., a tRNA that is loaded with Ser and has specificity to AGU
is denoted as $tRNA_{AGU,Ser}$. The network also contains all possible aaRS (Table 2, Eq. 7), $Syn_{n,a}$,
such that the system is able to load all amino acids to all tRNAs. The specificity of the aaRS to
certain combinations of codons and amino acids is represented by two subscript indices $n, a$, with $n$
representing the respective codon and $a$ representing a certain amino acid. The set $\mathcal{R}_{GC}$ contains
all reactions loading the amino acids onto the tRNAs (Table 2, Eq. 8) and all reactions inserting
an amino acid in the peptide sequence (Table 2, Eq. 9). Figure 2A displays a subnetwork with two
codons (GGA, AGU), two amino acids (Gly, Ser) and the respective other elements of the network
(tRNAs and synthetases).

Figure 2. Subnetwork of the full gene translation network model with synthetases
($N_{GC}$). The network (panel A) shows a subnetwork of the gene translation network model
containing the translation and loading reactions for two selected codons (GGA, AGU) and amino
acids (Gly, Ser). The semantic analysis shows that four code pairs can be implemented by this
network (panel B).

Table 2. Reaction network formulation of a gene translation system with synthetases

Eq.  Definition and description
1    $\mathcal{M}_{GC} = Codons \cup AA^{free} \cup AA^{prot} \cup aaRS \cup tRNA^{free} \cup tRNA^{loaded}$
     Definition of the molecular species in the network
2    $Codons = \{A, C, G, T\}^3 = \{AAA, AAC, \ldots, TTT\}$
     Set representing the 64 codons of the genetic code
3    $AA^{free} = \{Ala^{free}, Arg^{free}, Asp^{free}, \ldots, Try^{free}\}$
     Amino acids that are not used in a protein
4    $AA^{prot} = \{Ala^{prot}, Arg^{prot}, Asp^{prot}, \ldots, Try^{prot}\}$
     Amino acids that have been used in a protein during gene translation
5    $tRNA^{free} = \{tRNA_n \mid n \in Codons\}$
     Unloaded tRNAs specific for codon n
6    $tRNA^{loaded} = \{tRNA_{n,a} \mid n \in Codons, a \in AA^{free}\}$
     tRNAs specific for codon n that have been loaded with amino acid a
7    $aaRS = \{Syn_{n,a} \mid n \in Codons, a \in AA^{free}\}$
     Amino-acyl-tRNA-synthetases that are specific for amino acid a and codon n
8    $\mathcal{R}_{GC} = \{tRNA_n + a + Syn_{n,a} \to tRNA_{n,a} + Syn_{n,a} \mid n \in Codons, a \in AA^{free}\} \, \cup$
     Loading of the tRNA by a suitable synthetase
9    $\{n + tRNA_{n,a} \to n + tRNA_n + a \mid n \in Codons, a \in AA^{prot}\}$
     Translation step, i.e., the incorporation of an amino acid into a growing protein

The analysis of this extended network ($N_{GC}$) describing all potential genetic codes with 64
codons and 20 amino acids results in 1,532,160 binary code pairs, i.e., $SC_{log}(N_{GC}) \approx 20.55$. This
is a different result than for the less detailed model calculated above (383,040 code pairs). The extension of
the model by aaRS, unloaded tRNAs, and unloaded amino acids increases the semantic capacity. A
closer look at the resulting codes shows that not only the codons can be signs, but also the unloaded
tRNAs ($tRNA^{free}$) can function as signs. These additional signs increase the number of code pairs.
The "new" codes differ structurally in their codemakers. While, classically, the codons are mapped
to the set of amino acids ($AA^{prot}$) using the loaded tRNAs ($tRNA^{loaded}$) as codemakers, the new
signs, i.e., unloaded tRNAs, are mapped to the set of amino acids by using a codemaker that consists
of the free amino acid loaded to the free tRNA, the synthetase performing the loading step, and the
codon that needs to be recognized by the tRNA. The number of code pairs in this system can be
calculated by

$CP = \left( \binom{n_s}{2} - \frac{n_s}{2} \right) \cdot \binom{n_m}{2}$

with $n_s$ as number of signs and $n_m$ as number of meanings. For the full gene translation system the
number of signs is $n_s = |Codons| + |tRNA^{free}|$ and $n_m = |AA^{prot}|$. Since there is always one pair
of one tRNA and codon belonging together, which therefore cannot be combined in a BMC, we
have to subtract the number of such pairs $n_s/2$ from the number of all combinations.
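
The counts reported for the gene translation models can be checked with a few lines of arithmetic. The sketch below is an illustration, not the authors' code: it reads the verbal description (all sign pairs, minus the $n_s/2$ codon/tRNA pairs that belong together, times all meaning pairs) as a closed formula, and the function names are hypothetical.

```python
from math import comb, log2


def code_pairs_codons_only(n_codons: int, n_amino_acids: int) -> int:
    """Simple model: every pair of codons combined with every pair of amino acids."""
    return comb(n_codons, 2) * comb(n_amino_acids, 2)


def code_pairs_with_trnas(n_signs: int, n_meanings: int) -> int:
    """Extended model: signs are codons plus unloaded tRNAs; the n_signs / 2
    codon/tRNA pairs that belong together cannot form a BMC and are subtracted."""
    return (comb(n_signs, 2) - n_signs // 2) * comb(n_meanings, 2)


simple = code_pairs_codons_only(64, 20)
extended = code_pairs_with_trnas(64 + 64, 20)  # 64 codons + 64 unloaded tRNAs
subsystem = code_pairs_with_trnas(2 + 2, 2)    # two-codon subnetwork of Figure 2

print(simple, round(log2(1 + simple), 2))      # 383040 18.55
print(extended, round(log2(1 + extended), 2))  # 1532160 20.55
print(subsystem)                               # 4
```

Under this reading the formula reproduces both the 1,532,160 code pairs of the full network and the four code pairs of the two-codon subnetwork.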

Table 3. Code pairs realized by the subsystem of the gene translation network with
synthetases shown in Figure 2.

Code pair   Signs                          Meanings
1           $\{GGA, AGU\}$                 $\{Gly^{prot}, Ser^{prot}\}$
2           $\{GGA, tRNA_{AGU}\}$          $\{Gly^{prot}, Ser^{prot}\}$
3           $\{AGU, tRNA_{GGA}\}$          $\{Gly^{prot}, Ser^{prot}\}$
4           $\{tRNA_{GGA}, tRNA_{AGU}\}$   $\{Gly^{prot}, Ser^{prot}\}$

Table 4. Codemakers of the code pairs shown in Table 3.

Code pair 1
  Codemaker:             $\{tRNA_{GGA,Gly}, tRNA_{AGU,Ser}\}$
  Alternative codemaker: $\{tRNA_{AGU,Gly}, tRNA_{GGA,Ser}\}$
Code pair 2
  Codemaker:             $\{AGU, Ser^{free}, Syn_{AGU,Ser}, tRNA_{GGA,Gly}\}$
  Alternative codemaker: $\{AGU, Gly^{free}, Syn_{AGU,Gly}, tRNA_{GGA,Ser}\}$
Code pair 3
  Codemaker:             $\{GGA, Ser^{free}, Syn_{GGA,Ser}, tRNA_{AGU,Gly}\}$
  Alternative codemaker: $\{GGA, Gly^{free}, Syn_{GGA,Gly}, tRNA_{AGU,Ser}\}$
Code pair 4
  Codemaker:             $\{GGA, AGU, Gly^{free}, Ser^{free}, Syn_{GGA,Gly}, Syn_{AGU,Ser}\}$
  Alternative codemaker: $\{GGA, AGU, Gly^{free}, Ser^{free}, Syn_{GGA,Ser}, Syn_{AGU,Gly}\}$

Gene Regulatory Networks

A gene regulatory network (GRN) is a graph representing the regulation of gene expression of certain 
genes by the expression of other genes. A node in a GRN stands for a complex process. It represents 
the gene, the promoter and binding region of that gene, the binding of the transcription factor (TF) 
plus cofactors and the production of a product by the recruitment of the gene expression machinery. 

We will show here that the GRN of a cell is also a highly semantic system. Different sources 
of signals are integrated into a decision which gene is transcribed and which is repressed. Each of 
these signals has another meaning which emerges out of the contingency of the system. The system 
is contingent, because a mapping between signal and gene product can be altered by the exchange 
of a promoter region of a gene (or vice versa). This may happen enzymatically by the application 
of nucleases and ligases or by mutations. 

To identify the semantic capacity we describe a gene regulatory network as a reaction network
$(\mathcal{M}_{GRN}, \mathcal{R}_{GRN})$. $\mathcal{M}_{GRN}$ contains $n$ transcription factors $TF_i$, $m$ products $P_j$, and genes
$G_{ij}$. Each gene $G_{ij}$ represents a combination of a promoter site $i$ and a coding region $j$, where the
promoter site $i$ is specific to $TF_i$ and the coding region $j$ produces $P_j$. For our model we assume
that there exist as many promoter sites and coding regions as transcription factors and products,
respectively, such that any gene is possible. The set of all species $\mathcal{M}_{GRN}$ then is

$\mathcal{M}_{GRN} = \{TF_1, TF_2, \ldots, TF_i, \ldots, TF_n, P_1, P_2, \ldots, P_j, \ldots, P_m, G_{11}, G_{12}, \ldots, G_{ij}, \ldots, G_{nm}\}$.

Assuming that a transcription factor binds only one promoter and that a promoter is bound by only
one transcription factor, the expression of a gene $G_{ij}$ is given by

$\mathcal{R}_{GRN} = \{TF_i + G_{ij} \to TF_i + G_{ij} + P_j\}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$.



Figure 3 illustrates the network definition. We here do not present a generic model to describe
all possible gene regulatory networks, but a model that covers the main properties of regulation 
important for this study. The analysis of this system shows that the reaction network can implement 
molecular codes only in one way, i.e., with the transcription factors as signs and the set of products 
as meanings. The set of genes, i.e., the combination of promoter and coding region, forms the 
codemaker, because it allows for the contingent implementation of mappings between signs and 
meanings. Thus, in contrast to the model of the gene translation chemistry described above, the 
DNA is not the sign, but functions as the codemaker, i.e., it carries the contingency of the system. 
This shows that a code based analysis can only be done with regards to systems and not to single 
molecular species. 
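
For the smallest instance of this model (n = m = 2, the network listed in Figure 3B), the claim that codes arise only with transcription factors as signs and products as meanings can be checked by exhaustive search. The sketch below is a brute-force illustration, not the closure-based or pathway-based algorithm of the supplement: it enumerates every subset of species as a candidate codemaker, which is feasible only for very small networks. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Sequence, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS species, RHS species)
CodePair = Tuple[Tuple[str, str], Tuple[str, str], FrozenSet[str], FrozenSet[str]]


def closure(seed: Iterable[str], reactions: List[Reaction]) -> FrozenSet[str]:
    """Smallest closed set containing `seed`."""
    closed: Set[str] = set(seed)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in reactions:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return frozenset(closed)


def find_code_pairs(species: Sequence[str], reactions: List[Reaction]) -> List[CodePair]:
    """Return (S, M, codemaker, alternative codemaker) for every binary code pair."""
    # Every subset of species is a candidate context: 2^|species| of them.
    contexts = [frozenset(c) for r in range(len(species) + 1)
                for c in combinations(species, r)]
    reach: Dict[Tuple[str, FrozenSet[str]], FrozenSet[str]] = {
        (s, c): closure({s} | c, reactions) for s in species for c in contexts
    }

    def context_for(s1: str, a: str, s2: str, b: str) -> Optional[FrozenSet[str]]:
        """First context under which s1 yields a (not b) and s2 yields b (not a)."""
        for c in contexts:
            r1, r2 = reach[s1, c], reach[s2, c]
            if a in r1 and b not in r1 and b in r2 and a not in r2:
                return c
        return None

    found: List[CodePair] = []
    for s1, s2 in combinations(species, 2):
        for m1, m2 in combinations(species, 2):
            maker = context_for(s1, m1, s2, m2)
            alt = context_for(s1, m2, s2, m1)
            if maker is not None and alt is not None:
                found.append(((s1, s2), (m1, m2), maker, alt))
    return found


SPECIES = ["TF1", "TF2", "P1", "P2", "G11", "G22", "G12", "G21"]
REACTIONS: List[Reaction] = [
    (frozenset({"TF1", "G11"}), frozenset({"TF1", "G11", "P1"})),
    (frozenset({"TF2", "G22"}), frozenset({"TF2", "G22", "P2"})),
    (frozenset({"TF1", "G12"}), frozenset({"TF1", "G12", "P2"})),
    (frozenset({"TF2", "G21"}), frozenset({"TF2", "G21", "P1"})),
]

for S, M, maker, alt in find_code_pairs(SPECIES, REACTIONS):
    print(S, M, sorted(maker), sorted(alt))
```

The search reports a single code pair, with the transcription factors as signs, the products as meanings, and the gene sets {G11, G22} and {G12, G21} as codemakers. Genes cannot act as signs here because each gene can only ever yield one product, so no alternative mapping exists for them.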



Signaling Networks: Phosphorylation Cascades 



Cells maintain signaling systems of different kinds for signal transmission and integration [21]. The
most prominent signaling systems rely on reversible phosphorylation of amino acid side-chains for
regulation of signaling protein activity. The direct involvement of such systems in signaling suggests
that they may also be semantic systems. If so, they should be able to realize molecular codes.
We have studied phosphorylation cascades, like the MAP kinase regulatory network, as a typical
instance of an intra-cellular signaling system. These systems demonstrate the limitation of our
closure-based approach. The static approaches described above are not sufficient to detect the codes
in a phosphorylation cascade. A more refined approach is necessary that distinguishes between
concentrations. It can be derived from our definitions here in a straightforward way.

Figure 3. Gene regulatory network model. Panel A: (left) Simple model of the expression of
a gene, (right) reaction network formulation of the same process, TF + G → TF + G + P. Boxes in
Panel A indicate the semantic interpretation, i.e., the transcription factors are the signs, the
products are the meanings, and the DNA is the codemaker. Panel B: reaction network constructed
according to the formalization of gene regulation shown in (A), with
M = {TF1, TF2, P1, P2, G11, G22, G12, G21} and
R = {TF1 + G11 → TF1 + G11 + P1, TF2 + G22 → TF2 + G22 + P2,
TF1 + G12 → TF1 + G12 + P2, TF2 + G21 → TF2 + G21 + P1}.

Applying this more refined approach, i.e., taking concentrations into account, we can see that 
the activation of a kinase by phosphorylation can generate a molecular mapping, but this mapping 
is not necessarily a molecular code (Figure |4]\). A two-step cascade is able to implement a molecular 
code (Figure HlB). 

The simple one-step phosphorylation model (Figure HJA.) contains two kinases; an initial kinase 
(S) which is able to phosphorylate the target kinase {S^ + A ^ ^^)- We also model the dephos- 
phorylation {A^ — )■ A). For sake of simplicity we do not model the phosphatases, and the phosphate 
related molecular species (e.g., ATP, ADP, P) involved in the process, but assume a buffered con- 
centration. In the simple, one step, model we may observe a molecular mapping between and the 
two states of kinase A. If has a low concentration the system is in a state where the unphospho- 
rylated state A has a high concentration and the phosphorylated state A^ has a low concentration. 
According to the definition of molecular code given above the system should be able to change the 
mapping, i.e., be contingent, by the application of a different molecular context to realize a code. 
Here, no alternative mapping between S and A can be realized, such that the system is not able to 
realize a molecular code. 

Figure 4 (reaction network diagrams not reproduced; the mappings shown in the panels are listed here). 

Panel A 
    input (sign)      output (meaning) 
    [S^P] low         [A] high / [A^P] low 
    [S^P] high        [A] low / [A^P] high 

Panel B, C = {B, B^P} 
    input (sign)      output (meaning) 
    [S^P] low         [A] low / [A^P] high 
    [S^P] high        [A] high / [A^P] low 

Panel B, C = {C, C^P} 
    input (sign)      output (meaning) 
    [S^P] low         [A] high / [A^P] low 
    [S^P] high        [A] low / [A^P] high 

Figure 4. Reaction networks describing phosphorylation motifs. Molecular species in 
these networks represent kinases or phosphatases that may be activated or inactivated. Activated 
and non-activated forms of a kinase/phosphatase are modelled as different species (e.g., species 
A/A^P). Panel A: (left) Reaction network of a simple phosphorylation motif, which can realize a 
molecular mapping (right), but not a molecular code; panel B: (left) more complex reaction 
network that can realize a molecular code. The molecular code is not only specified by the species, 
but also by their concentrations. 

If we consider a different system where two kinases are in between S and A, we obtain a two-step 
phosphorylation cascade. S^P now phosphorylates the inserted species, while these have an effect 
on A. Now the system has the possibility to "choose" between two alternative systems, i.e., the 
inserted species may be "active" in the unphosphorylated state (B) or in the phosphorylated state 
(C). There exist several mappings in such a system, e.g., between S and B, S and C, and S and 
A. The former two mappings behave like the simple model (see above). The mapping between S 
and A is a molecular code, because the molecular context of the system can be changed such that 
the alternative system behavior is generated (see Figure 4B (right)). The codemaker of the code 
between S and A is either the B-system or the C-system. 
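The contrast between a mere mapping and a code can be illustrated with a small qualitative sketch. It only reproduces the high/low mapping tables of Figure 4B and does not simulate concentrations; the function names and the pairwise check `is_contingent_pair` are hypothetical simplifications and not the formal criterion used in the paper.

```python
def target_state(s_high: bool, context: str) -> str:
    """Qualitative state of kinase A in the two-step cascade.

    context "B": the inserted kinase is active when unphosphorylated.
    context "C": the inserted kinase is active when phosphorylated.
    """
    if context not in ("B", "C"):
        raise ValueError("context must be 'B' or 'C'")
    # S^P phosphorylates the inserted kinase, so a high [S^P] means the
    # inserted kinase is mostly in its phosphorylated form.
    inserted_phosphorylated = s_high
    if context == "C":
        inserted_active = inserted_phosphorylated
    else:
        inserted_active = not inserted_phosphorylated
    # An active inserted kinase phosphorylates A.
    return "[A^P] high" if inserted_active else "[A] high"


def mapping(context: str) -> dict:
    """Sign -> meaning table realized under one molecular context."""
    signs = {False: "[S^P] low", True: "[S^P] high"}
    return {signs[s]: target_state(s, context) for s in (False, True)}


def is_contingent_pair(map_1: dict, map_2: dict) -> bool:
    """Simplified check: same signs, both mappings one-to-one, and every
    sign is assigned a different meaning in the two contexts."""
    if map_1.keys() != map_2.keys():
        return False
    one_to_one = all(len(set(m.values())) == len(m) for m in (map_1, map_2))
    return one_to_one and all(map_1[s] != map_2[s] for s in map_1)


if __name__ == "__main__":
    for ctx in ("B", "C"):
        print(ctx, mapping(ctx))
    print("alternative mapping available:",
          is_contingent_pair(mapping("B"), mapping("C")))
```

In the one-step model there is only a single context, so no second mapping exists to compare against, which is why that motif yields a mapping but no code.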

Random reaction networks as null model for code identification 

To check whether the motif describing a BMC can be generated by chance, we analyzed random 
reaction networks of different sizes and densities for their semantic capacity. The networks have 
been generated by random insertion of reaction rules into the empty network. Each random reaction 
rule is bimolecular, i.e., contains two reactants and one product. 
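The generation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `random_network`, the choice to draw reactants with replacement (so that A + A is allowed), and the rejection of duplicate rules are all assumptions made here.

```python
import random


def random_network(n_species: int, n_reactions: int, seed=None):
    """Insert random bimolecular rules (two reactants, one product)
    into an initially empty network until n_reactions rules exist."""
    rng = random.Random(seed)
    species = list(range(n_species))
    # Number of distinct unordered reactant pairs times possible products.
    max_rules = n_species * (n_species + 1) // 2 * n_species
    if n_reactions > max_rules:
        raise ValueError("more rules requested than distinct rules exist")
    rules = set()
    while len(rules) < n_reactions:
        a, b = sorted(rng.choices(species, k=2))  # reactants, order ignored
        product = rng.choice(species)
        rules.add(((a, b), product))              # duplicates are dropped
    return sorted(rules)


if __name__ == "__main__":
    for reactants, product in random_network(8, 12, seed=1):
        print(f"{reactants[0]} + {reactants[1]} -> {product}")
```

Varying `n_reactions` for a fixed `n_species` gives networks of different density for the same size, which is the axis along which the semantic capacity is compared.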

Figure 5. Structural properties of random reaction networks. Panel A: Number of paths 
of the respective combination of species and reactions. Panel B: Number of closed sets. Panel C: 
Semantic capacity. Panels A, B, and C show three important network parameters for five different 
network sizes and various numbers of reaction rules. Each data point represents the average over 
random replicates. Error bars indicate the standard error of the mean. Panel A shows the average 
number of paths in the network. Since we applied the path algorithm, which only uses the 
k-shortest paths between each pair of molecular species, the curve shows a sigmoidal behavior, 
which saturates at the value |M| · (|M| - 1) · k, with k = 10. Panel B shows the average 
number of closed sets. With growing density the number of closed sets decreases. Panel C shows 
the distributions of the average number of code pairs (log measure with base 2) in random 
networks of different sizes. The semantic capacity shows a unimodal distribution, which 
correlates with the other two network parameters shown. If the number of paths is too low, no 
mappings can be implemented because of the missing links. If the number of closed sets is too low, 
no unique mappings can be implemented. 

The analysis showed that the binary code motif can be generated in random networks (see Figure 
5), i.e., contingent mappings can be generated randomly. For a fixed network size and varying 
densities the average semantic capacity shows a unimodal behavior, which suggests that there exists 
an optimal range of densities for each network size, leading to maximal semantic capacity. This 
optimal range shifts to higher densities with increasing size of the network (see Figure 6). The 
optimal interval is bounded on the left (lower densities) by the low complexity of the network: there 
are not enough reactions to promote the insertion of molecular codes by chance. At higher densities the 
network is strongly connected, such that it is harder to obtain closed sets, and therefore it is also 
harder to implement codes by chance. The optimal interval coincides with two important network 
properties, i.e., the number of paths and the number of closed sets. With increasing network density 
the number of paths grows, while the number of closed sets decreases. High semantic capacity can 
be found in networks with a high number of pathways and at the same time a high number of closed 
sets. 
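The effect of density on the number of closed sets can be made concrete with a brute-force sketch. It assumes the usual notion of closure from chemical organization theory (a set is closed if every reaction whose reactants all lie in the set also has all its products in the set) and counts the empty set as closed; both points, as well as the function names, are assumptions of this illustration. The enumeration is exponential in the number of species and only meant for toy networks.

```python
from itertools import combinations


def is_closed(subset, reactions) -> bool:
    """True if no applicable reaction produces a species outside subset."""
    s = set(subset)
    return all(set(products) <= s
               for reactants, products in reactions
               if set(reactants) <= s)


def closed_sets(species, reactions):
    """Enumerate all closed subsets of the species (brute force)."""
    found = []
    for size in range(len(species) + 1):
        for subset in combinations(species, size):
            if is_closed(subset, reactions):
                found.append(frozenset(subset))
    return found


if __name__ == "__main__":
    species = ["a", "b", "c"]
    sparse = [(("a", "b"), ("c",))]
    dense = sparse + [(("a", "c"), ("b",)),
                      (("b", "c"), ("a",)),
                      (("a", "a"), ("b",))]
    for name, net in (("empty", []), ("sparse", sparse), ("dense", dense)):
        print(name, len(closed_sets(species, net)))
```

Adding reaction rules can only remove closed sets, never create new ones, which is the qualitative trend reported for Panel B of Figure 5.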




Figure 6. Scatter plot of the positions (Molecular species, Reactions) of the maximal 
semantic capacity of the unimodal distributions for the random networks analyzed in 
this study. The horizontal axis shows the number of molecular species (8 to 20). The linear 
regression results in the function: 
Reactions = -3.06 + 1.89 · Molecular species. 
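As a worked example, the regression line can be evaluated for the network sizes covered by Figure 6. The function name is hypothetical; the coefficients are the ones stated in the caption, and values outside the fitted range of 8 to 20 species would be an extrapolation.

```python
def optimal_reactions(n_species: float) -> float:
    """Number of reactions at which random networks of the given size
    reached maximal semantic capacity, according to the fitted line."""
    return -3.06 + 1.89 * n_species


if __name__ == "__main__":
    for n in range(8, 21, 2):
        print(n, round(optimal_reactions(n), 2))
```

For 16 species the line gives about 27 reactions, far below the 207 reactions of the 16-species NTOP chemistry discussed later, which is why NTOP counts as a dense network.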



Non-Biological Reaction Networks 

Combustion Chemistries 

We analyzed a number of chemical systems, i.e., the combustion chemistries of hydrogen [7], methane 
[45], ethanol [26], and dimethyl ether [18]. The original combustion chemistry data (provided in 
CHEMKIN format) have been processed to obtain the reaction networks describing the respective 
chemistry. The chemistries are intended to describe all significant processes that can occur 
in the combustion, i.e., burning, of the respective molecule. In the CHEMKIN files most of the 
reactions are described as equilibrium reactions with additional thermodynamic parameters. Taking 
these as a basis, we obtain reaction networks (see definition above) containing the directed reactions 
depending on the thermodynamic parameters. The obtained reaction networks (Supplement S7) 
vary in their size (10 - 79 molecular species) and density (38 - 752 reactions). We found that none 
of the chemistries is able to realize molecular codes. 

We also analyzed the atmosphere chemistry of Mars [30] to check whether other kinds of 
non-biological chemistries may contain codes. The atmosphere chemistry of Mars contains 32 molecular 
species, 104 reactions, and 5512 closed sets. In particular, the network describes the reactions 
happening on the day side of Mars. Therefore, light (hν) is modelled explicitly and there exists an inflow 
reaction for light. The Martian chemistry is also not able to realize molecular codes. 

Here, we compare the obtained results with random reaction networks of the same size and density. 
The original hydrogen chemistry could not realize molecular codes. This may be due to the small 
number of closed sets compared to the number of paths, such that the molecular species are "too 
connected" and the network is less structured. In random networks of the same size and density no 
molecular code can be identified. The estimated numbers of closed sets and paths, although differing 
from those of the original chemistry, also indicate that the networks are not in the optimal 
interval (compare the section on random reaction networks). 

In the methane combustion chemistry we see that there exist far more paths than closed sets, 
such that the network is "unstructured". The corresponding null model also contains a high 
number of paths, but a higher number of closed sets as well. The algorithmic analysis showed that 
some null-model networks can realize BMCs, such that the average semantic capacity is SC_log = 1.04. 
Nevertheless, we consider this a very low semantic capacity compared with, e.g., the gene 
translation chemistry. For the other two combustion chemistries (ETH, DME) and the Martian 
atmosphere chemistry (MARS) the analysis of the random networks is not feasible with our current 
algorithms, due to the large number of paths and closed sets in these networks. 

Table 5. Comparison of combustion chemistries and random networks (null model). 

      Combustion chemistry properties               Null model estimate 
      |M|  |R|  #closed sets  #paths       SC_log   est. #closed sets (s.e.)  est. #paths (s.e.)  est. SC_log (s.e.) 
HYD   10   38   16            7.69 · 10^?           39.8 (0.53)               878.2 (1.27)        0 (0) 
MET   37   340  4,136         > 10^? *              6,521.83 (353.63)         > 10^? *            1.09 (0.15) 
ETH   57   752  5,136         > 10^? *              > 10^? *                  > 10^? *            n.a. 
DME   79   708  8             > 10^? *              n.a.                      > 10^? *            n.a. 

n.a. - not available, s.e. - standard error of the mean, * estimated by performing runs on several networks (or growing 
values of k regarding #paths) where not all runs completed due to computational complexity, such that the maximal 
found value gives the estimate. 



Artificial Chemistry NTOP 

Recall that with increasing density random networks have a vanishing semantic capacity. In the 
following we show that even a dense network can have a relatively high semantic capacity. For this 
purpose we analyze an artificial chemistry with 16 species introduced by Banzhaf [1], called 
NTOP. For each species there is a 4-bit binary representation, and the reaction rules are derived 
with respect to this representation, which is referred to as a structure-to-function mapping (see [1] 
for details and Supplement S8 for the full network model). 

The algorithmic analysis results in six code pairs (for an overview see Figure 7 and Supplement 
Text S9). One of the code pairs with its respective codemaker is shown in Table 6. Figure 7 
illustrates two properties of molecular codes: (1) a meaning can take the role of a sign in another 
code, and (2) molecular species can function as signs (or meanings) in different codes, i.e., they keep 
their role in different contexts. Both properties reflect the context dependency of codes, i.e., the 
molecular species constituting the code depend on the molecular context, the codemaker. 
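The relation between the number of code pairs and the logarithmized semantic capacity SC_log can be checked numerically. The formal definition is given earlier in the paper and is not repeated in this section; the form log2(1 + n) used below is an assumption of this sketch, chosen because it reproduces the value 2.81 listed for NTOP in Table 8 for six code pairs and yields 0 for a network without code pairs.

```python
import math


def sc_log(n_code_pairs: int) -> float:
    """Assumed logarithmized semantic capacity: log2(1 + #code pairs)."""
    if n_code_pairs < 0:
        raise ValueError("number of code pairs must be non-negative")
    return math.log2(1 + n_code_pairs)


if __name__ == "__main__":
    for n in (0, 2, 6):
        print(n, round(sc_log(n), 2))
```

Under the same assumption, the value 1.58 listed for the small example network FIG1C would correspond to two code pairs.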

Figure 7. Code pairs found in the artificial chemistry NTOP. An edge connects a sign 
with a meaning if they occur in the same code (indicated by colors). Species belonging to 
codemakers have been omitted. Note that a sign can be used in different codes and that a meaning 
might be used as a sign in another code (e.g., 10 in {10,15}). A vertex represents the closed set 
generated by the respective species alone. 

Table 6. Binary molecular code from the NTOP chemistry. Species are indicated by 
index. The first line indicates the respective species. The second line contains the closed sets 
generated by the species alone and the closed sets that form the codemaker. 

             sign 1   sign 2    meaning 1   meaning 2   codemaker      alternative codemaker 
species      2        13        3           8           - 
closed sets  {1,2}    {13,15}   {1,3}       {8}         {0,5,10,15}    {0,1,9} 

To test the robustness of the network's ability to realize molecular codes, we randomized it by 
replacing 1, 2, 5, 10, 15, 200, and 1000 reaction rules randomly. The number of educts and products 
of each individual reaction is kept constant; only the molecular species are replaced. Increased 
randomization results in a clear decline of the average semantic capacity. Nevertheless, in some cases 
the randomized network is capable of implementing more code pairs. The average trend, i.e., the loss of 
code pairs, can be explained by referring to random reaction networks. Random reaction networks 
with the same number of species and reactions as NTOP also have a very low semantic capacity 
(SC_log = 0). Thus the randomization of the NTOP chemistry drives the system towards the mean 
semantic capacity of random networks. 
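The randomization step can be sketched as below. This is an illustration under assumptions, not the authors' code: the function name `randomize` is hypothetical, and rules are picked with repetition, which is assumed here because up to 1000 replacements are applied to a network that Table 8 lists with 207 reactions. As described above, the number of educts and products of each rule is preserved and only the species are redrawn.

```python
import random


def randomize(reactions, n_replacements, species, seed=None):
    """Replace randomly chosen rules by rules with the same numbers of
    educts and products but randomly drawn species."""
    rng = random.Random(seed)
    randomized = [(tuple(r), tuple(p)) for r, p in reactions]
    for _ in range(n_replacements):
        i = rng.randrange(len(randomized))       # a rule may be hit twice
        educts, products = randomized[i]
        randomized[i] = (
            tuple(rng.choice(species) for _ in educts),
            tuple(rng.choice(species) for _ in products),
        )
    return randomized


if __name__ == "__main__":
    toy = [((0, 1), (2,)), ((2, 3), (0, 1)), ((1,), (3,))]
    print(randomize(toy, 5, species=[0, 1, 2, 3], seed=42))
```

With a growing number of replacements the result approaches a random network of the same size and density, which matches the trend towards the mean semantic capacity of random networks described in the text.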



Conclusion 

We introduced a formal criterion for identifying molecular codes in reaction networks and a measure 
of the semantic capacity of a network, as the number of different code pairs the network can realize. 
Our notion of contingency, defined as the ability of systems to choose between different mappings, 
extends the notion of "independence" used by Barbieri. 

Applying the new concepts to different networks, our basic finding is that the semantic capacity 
of biological networks tends to be higher than that of the studied non-biological networks. Thus, 
an important step during the transition from non-life to life must have been the utilization of a 
chemistry that allows the implementation of molecular codes. In our opinion it is an open issue what 
that first coding chemistry looked like. But we now have a criterion that can guide us in what we 
have to look for. 

Moreover, we can now precisely formulate another hypothesis, namely, that during the course 
of evolution the semantic capacity of the chemistry employed by biological systems has a 
tendency to increase, though not necessarily monotonically. One candidate mechanism is the invention 
and improvement of compositional adaptors, like proteins with exchangeable domains [5] or genes 
including their promoter and coding regions. Note that the appearance and evolution of 
neurons and cognitive systems is also in line with the hypothesis of increasing semantic capacity. 

Figure 8. Randomization of NTOP. Boxplot showing the SC distribution of the randomized 
NTOP chemistry. We can observe that on average randomization destroys the semantic capacity 
(non-logarithmized) of the network. The boxplots show the distribution of the number of code 
pairs after randomization of NTOP (n = 100). 

Table 7. Overview of the semiotic interpretation of the biological systems surveyed in this 
paper. 

role       gene regulatory codes                genetic codes                                            phosphorylation cascade codes 
signs      transcription factor                 codon or unloaded tRNA                                   high concentration of a kinase or phosphatase 
meanings   gene product                         amino acid                                               high/low concentration of a target molecule 
codemaker  DNA with promoter and coding region  loaded tRNA or mixture of loaded tRNA, aaRS, and codons  a kinase or phosphorylases 



The analysis of a formalization of the genetic code showed that not only codons are signs for 
an amino acid, but tRNAs could be signs as well. The bio-molecular and evolutionary interpretation 
of this fact should be left for future studies. Furthermore, we have shown that DNA can function not 
only as a sign but also as a codemaker, as the modeling of GRNs revealed. The mechanisms in 
gene regulatory systems and the observation that such systems are highly flexible (i.e., the mapping 
between TFs and products can easily be changed) lead to the conclusion that the chemistry of 
GRNs also possesses a high semantic capacity. 

The analysis of random networks of different sizes and densities provides a better understanding 
of the basal rate of code occurrence. We can observe that the distribution of BMCs is unimodal. 
Random networks with high semantic capacity show at the same time a high number of closures 
(which decreases with increasing network density) and a high number of pathways (which decreases 
with decreasing network density). The analysis of an artificial chemistry showed that the semantic 
capacity can also be high in dense networks. We hypothesize that this was caused by the 
structure-to-function mapping applied in the artificial chemistry. 

Future work includes the formal integration of information theory and the integration of 
pragmatics. Furthermore, we can extend this static algebraic approach to a continuous and dynamic 
approach. 

When we address the semantic aspects of biological information, terminology becomes a hot topic 
of discussion. Although we appreciate this discussion from a philosophical perspective, we believe 
a pragmatic focus is necessary to obtain a stronger impact in the life sciences. This pragmatic track 
of the study of meaning for biological information requires at least three ingredients: (1) 
(semi-)formal definitions, (2) algorithms, tools, and predictions, and (3) links to experimental data (i.e., 
the physical world). These three ingredients obviously interact with each other and should thus be 
studied together. 

Supporting Information 

Text - S1 s1_pseudocode.pdf 
Pseudocode of all algorithms and subroutines used in the analysis of the data for this paper. 

Network - S2 s2_gc_merge.zip 
Network file in rea- and sbml-format containing the description of the merge of the 17 genetic codes 
listed at NCBI. 

Text - S3 s3_gc_merge_codes.pdf 
Text file containing the result of the computational analysis of the network from S2. 

Network - S4 s4_gcfull_64_20.zip 
Network file in rea- and sbml-format containing the description of the full gene translation chemistry 
(without synthetases). 

Network - S5 s5_gcsynth_Gly_Ser.zip 
Network file in rea- and sbml-format containing the description of a subnetwork (Gly, Ser) of the full 
gene translation chemistry (with synthetases). 

Text - S6 s6_gcsynth_Gly_Ser_codes.pdf 
Text file containing the result of the computational analysis of the network from S5. 

Network - S7 s7_non-biological-networks.zip 
Network files of the analyzed combustion chemistries and the Martian atmosphere chemistry. 

Network - S8 s8_ntop.zip 
Network files of the artificial chemistry NTOP. 

Text - S9 s9_ntop_codes.pdf 
Text file containing the result of the algorithmic analysis of NTOP. 

Acknowledgments 

We are grateful for many fruitful discussions with Stefan Artmann and his valuable comments and 
suggestions. We thank Marcel Hieckel for preparing the combustion chemistry data. We acknowledge 
funding by the DFG through the Jena School for Microbial Communication (JSMC). 






References 

1. W. Banzhaf. Self-replicating sequences of binary numbers. Comput Math Appl, 26:1-8, 1993. 

2. M. Barbieri. Biosemiotics: a new understanding of life. Naturwissenschaften, 95 (7): 577-599, 
2008. 

3. B. Batuecas, R. Garesse, M. Calleja, J. R. Valverde, and R. Marco. Genome organization of 
Artemia mitochondrial DNA. Nucleic Acids Res, 16(14A):6515-6529, 1988. 

4. J. L. Boore and W. M. Brown. Complete DNA sequence of the mitochondrial genome of the 
black chiton, Katharina tunicata. Genetics, 138(2):423-443, 1994. 

5. E. Bornberg-Bauer, A. K. Huylmans, and T. Sikosek. How do new proteins arise? Curr Opin 
Struct Biol, 20:390-396, 2010. 

6. G. D. Clark-Walker and G. F. Weiller. The structure of the small mitochondrial DNA of 
Kluyveromyces thermotolerans is likely to reflect the ancestral gene order in fungi. J Mol 
Evol, 38(6):593-601, 1994. 

7. M. Ó Conaire, H. J. Curran, J. M. Simmie, W. J. Pitz, and C. Westbrook. A comprehensive 
modeling study of hydrogen oxidation. Int J Chem Kinet, 36(11):603-622, 2004. 

8. G. A. Durrheim, V. A. Corfield, E. H. Harley, and M. H. Ricketts. Nucleotide sequence of 
cytochrome oxidase (subunit III) from the mitochondrion of the tunicate Pyura stolonifera: 
evidence that AGR encodes glycine. Nucleic Acids Res, 21(15):3587-3588, 1993. 

9. A. Elzanowski and J. Ostell. The genetic code, 2010. 
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi, version 3.9, July 07, 2010, 
retrieved: Feb 20, 2011. 

10. W. Fontana and L. Buss. The arrival of the fittest: Toward a theory of biological organization. 
Bulletin of Mathematical Biology, 56:1-64, 1994. 

11. J. R. Garey and D. R. Wolstenholme. Platyhelminth mitochondrial DNA: evidence for early 
evolutionary origin of a tRNA(serAGN) that contains a dihydrouridine arm replacement loop, 
and of serine-specifying AGA and AGG codons. J Mol Evol, 28(5):374-387, 1989. 

12. D. Görlich and P. Dittrich. Identifying molecular organic codes in reaction networks. In 
G. Kampis and E. Szathmáry, editors, 10th European Conference on Artificial Life, volume 
5777 of LNCS, pages 305-312. Springer, 2011. Sep 13 - Sep 16, 2009, Budapest. 

13. Y. Hayashi-Ishimaru, T. Ohama, Y. Kawatsu, K. Nakamura, and S. Osawa. UAG is a sense 
codon in several chlorophycean mitochondria. Curr Genet, 30(1):29-33, 1996. 

14. H. Himeno, H. Masaki, T. Kawai, T. Ohta, I. Kumagai, K. Miura, and K. Watanabe. Unusual 
genetic codes and a novel gene structure for tRNA(AGYSer) in starfish mitochondrial DNA. 
Gene, 56(2-3):219-230, 1987. 

15. R. J. Hoffmann, J. L. Boore, and W. M. Brown. A novel mitochondrial genome organization 
for the blue mussel, Mytilus edulis. Genetics, 131(2):397-412, 1992. 






16. H. T. Jacobs, D. J. Elliott, V. B. Math, and A. Farquharson. Nucleotide sequence and gene 
organization of sea urchin mitochondrial DNA. J Mol Biol, 202(2):185-217, 1988. 

17. T. H. Jukes and S. Osawa. Evolutionary changes in the genetic code. Comp Biochem Physiol 
B, 106(3):489-494, 1993. 

18. E. Kaiser, T. Wallington, M. D. Hurley, J. Platz, H. J. Curran, W. J. Pitz, and C. K. 
Westbrook. Experimental and modeling study of premixed atmospheric-pressure dimethyl 
ether-air flames. Journal of Physical Chemistry, 104(35):8194-8206, 2000. 

19. P. J. Keeling and W. F. Doolittle. A non-canonical genetic code in an early diverging eukary- 
otic lineage. EMBO J, 15(9):2285-2290, 1996. 

20. A. Kondow, T. Suzuki, S. Yokobori, T. Ueda, and K. Watanabe. An extra tRNAGly(U*CU) 
found in ascidian mitochondria responsible for decoding non-universal codons AGA/AGG as 
glycine. Nucleic Acids Res, 27(12):2554-2559, 1999. 

21. G. Krauss. Biochemistry of Signal Transduction and Regulation. Wiley-VCH, Weinheim, 4 
edition, 2008. 

22. B.-O. Küppers. Information and the origin of life. MIT Press, Cambridge/MA, 1990. 
(Originally published 1986). 

23. M. J. Laforest, I. Roewer, and B. F. Lang. Mitochondrial tRNAs in the lower fungus Spizel- 
lomyces punctatus: tRNA editing and UAG 'stop' codons recognized as leucine. Nucleic Acids 
Res, 25(3):626-632, 1997. 

24. T. Lenaerts, J. Ferkinghoff-Borg, F. Stricher, L. Serrano, J. W. H. Schymkowitz, 
and F. Rousseau. Quantifying information transfer by protein domains: analysis 
of the Fyn SH2 domain structure. BMC Struct Biol, 8:43, 2008. 
http://dx.doi.org/10.1186/1472-6807-8-43 

25. A. Liang and K. Heckmann. Blepharisma uses UAA as a termination codon. Naturwis- 
senschaften, 80(5):225-226, 1993. 

26. N. M. Marinov. A detailed chemical kinetic model for high temperature ethanol oxidation. 
Int J Chem Kinet, 31:183-220, 1999. 

27. E. Q. V. Martins and M. M. B. Pascoal. A new implementation of Yen's ranking loopless 
paths algorithm. 4OR: A Quarterly Journal of Operations Research, 1:121-133, 2003. ISSN 
1619-4500. http://dx.doi.org/10.1007/s10288-002-0010-2 

28. P. Mehta, S. Goyal, T. Long, B. L. Bassler, and N. S. Wingreen. Information processing and 
signal integration in bacterial quorum sensing. Mol Syst Biol, 5:325, 2009. 

29. J. Monod. Chance and necessity. Alfred Knopf, New York/NY, 1971. (Originally published 
1970). 

30. H. Nair, M. Allen, A. D. Anbar, and Y. L. Yung. A photochemical model of the martian 
atmosphere. Icarus, 111:124-150, 1994. 






31. A. M. Nedelcu, R. W. Lee, C. Lemieux, M. W. Gray, and G. Burger. The complete mitochon- 
drial DNA sequence of Scenedesmus obliquus reflects an intermediate stage in the evolution 
of the green algal mitochondrial genome. Genome Res, 10(6):819-831, 2000. 

32. T. Ohama, S. Osawa, K. Watanabe, and T. H. Jukes. Evolution of the mitochondrial genetic 
code. IV. AAA as an asparagine codon in some animal mitochondria. J Mol Evol, 30(4): 
329-332, 1990. 

33. S. Osawa, T. Ohama, T. H. Jukes, and K. Watanabe. Evolution of the mitochondrial genetic 
code. I. origin of AGR serine and stop codons in metazoan mitochondria. J Mol Evol, 29(3): 
202-207, 1989. 

34. S. Osawa, T. H. Jukes, K. Watanabe, and A. Muto. Recent evidence for evolution of the 
genetic code. Microbiol Rev, 56(1):229-264, 1992. 

35. H. H. Pattee. The physics of symbols: bridging the epistemic cut. Biosystems, 60(1-3):5-21, 
2001. 

36. S. U. Schneider and E. J. de Groot. Sequences of two rbcS cDNA clones of Batophora oerstedii: 
structural and evolutionary considerations. Curr Genet, 20(1-2):173-175, 1991. 

37. S. U. Schneider, M. B. Leible, and X. P. Yang. Strong homology between the small subunit 
of ribulose-1,5-bisphosphate carboxylase/oxygenase of two species of Acetabularia and the 
occurrence of unusual codon usage. Mol Gen Genet, 218(3):445-452, 1989. 

38. T. D. Schneider and R. M. Stephens. Sequence logos: a new way to display consensus 
sequences. Nucleic Acids Res, 18(20):6097-6100, 1990. 

39. C. E. Shannon. A mathematical theory of communication. The Bell Systems Technical 
Journal, 27:379-423, 623-656, 1948. 

40. P. Speroni di Fenizio, P. Dittrich, J. Ziegler, and W. Banzhaf. Towards a theory of organiza- 
tions. In German Workshop on Artificial Life (GWAL 2000), in print, Bayreuth, 5.-7. April, 
2000, 2000. available online: http://di.ttri.ch/p/SDZB2001gwal.ps.gz. 

41. M. J. Telford, E. A. Herniou, R. B. Russell, and D. T. Littlewood. Changes in mitochondrial 
genetic codes as phylogenetic characters: two examples from the flatworms. Proc Natl Acad 
Sci USA, 97(21):11359-11364, 2000. http://dx.doi.org/10.1073/pnas.97.21.11359 

42. T. Tlusty. Casting polymer nets to optimize noisy molecular codes. Proc Natl Acad Sci USA, 
105(24):8238-8243, 2008. 

43. T. Tlusty. Rate-distortion scenario for the emergence and evolution of noisy molecular codes. 
Phys Rev Lett, 100(4):048101, 2008. 

44. S. Tsuda, S. Artmann, and K.-P. Zauner. The Phi-Bot. In A. Adamatzky and M. Komosinski, 
editors, Artificial Life models in hardware, pages 213-232. Springer, Dordrecht, 2009. 

45. T. Turányi, K. Hughes, M. Pilling, and A. Tomlin. The Leeds 
methane oxidation mechanism. online, 2001. Version 1.5, available at 
http://www.chem.leeds.ac.uk/Combustion/methane.htm 






46. S. Yokobori, Y. Watanabe, and T. Oshima. Mitochondrial genome of Ciona savignyi 
(Urochordata, Ascidiacea, Enterogona): comparison of gene arrangement and tRNA 
genes with Halocynthia roretzi mitochondrial genome. J Mol Evol, 57(5):574-587, 2003. 
http://dx.doi.org/10.1007/s00239-003-2511-9 



Table 8. List of all analyzed systems stating their size, density, semantic capacity, the reference of the system, and 
the method used for analysis. 

System           |M|    |R|    #closed sets       #paths       SC_log      Method  Reference   Description 
                                                                           c & p   this study  Network from Figure 1B. 
FIG1C            6      4      41                 16           1.58        c & p   this study  Network from Figure 1C. 
GCMERGE          234    85     n.a.               170          4.09        p       this study  Network reconstructed from the genetic codes reported at [9]. 
GCFULL           1364   1280   n.a.               n.a.         18.55       t       this study  Theoretical estimate of SC_log of a network, based on GCMERGE, generated by inserting all possible mappings between codons and amino acids. 
GCFULLSYNTSMALL  16     8      n.a.               200          2.32        p       this study  Network with two codons, two amino acids, and synthetases. 
GCFULLSYNT       2,728  2,560  n.a.               n.a.         20.55       t       this study  Theoretical evaluation of a reaction network containing all possible mappings between the 64 codons and 20 amino acids with synthetases. 
MARS             32     104    5,512              > 10^?                   c       [30]        Chemical processes occurring in the Martian atmosphere during the daylight phase. 
HYD              10     38     16                 7.69 · 10^?              c       [7]         Combustion chemistry of hydrogen. 
MET              37     340    4,136              > 10^?                   c       [45]        Combustion chemistry of methane. 
ETH              57     752    5,136              n.a.                     c       [26]        Combustion chemistry of ethanol. 
DME              79     708    8                  > 10^?                   c       [18]        Combustion chemistry of dimethyl ether. 
NTOP             16     207    244                474,218      2.81        c & p   [1]         Artificial chemistry based on binary string operations. 
R.NTOP           16     207    18.11 (s.e.=0.23)  n.a.         0 (s.e.=0)  c       this study  Average on 1000 random networks of the same size and density as the NTOP network. 
RANDOM           varies varies varies             varies       varies      c & p   this study  Analysis of different random networks. 

Abbrev.: c - closure-based algorithm, p - pathway-based algorithm, t - theoretical analysis, SC_log - logarithmized semantic capacity, n.a. - not available, s.e. - standard error of the mean 
*: determined with k = 10000. 



